HEAD ======= >>>>>>> acc04d98d5bfe4ca322513e908756d502a7488a5
<<<<<<< HEADThis research aims to determine if the postoperative function of a total knee replacement recipient can be modeled based on anthropomorphic data and details of the surgical procedure. We decided to carry out testing on models with two postoperative scoring systems as the response variable. The first score is the “Surgeon’s Score” which measures more objective postoperative metrics about the condition of the replaced knee following surgery. These metrics include: how straight or how flexed the patient can make thier knee, how well aligned the knee appears when inspecting from a frontal perspective and how much laxity or separation can be achieved by placing the lower leg under tension. Finally, the patient is asked about the overall pain they experience with the knee. The resulting score is a value between 0-100 with 80-100 being an excellent result.
The second score we model is the “Patient’s Score” which is generated by three questions: how far are they able to walk, how well can they navigate stairs and do they require any walking aids (crutches, cane, etc).
We hope to create a model that can predict each of the scores based on a variety of categorical and numeric predictors.
This data has been collected from a Southern California Hospital in collaboration with the Shiley Center for Orthopedic Research. The data is used for both clinical research and postoperative monitoring of implant survivorship. All patient specific information has been removed from the data so it adheres to the Health Insurance Portability and Accountability Act (HIPAA).
We hope to determine the impact a particular surgeon or the type of implants used has on the patient’s outcome after surgery. We are also interested in exploring the potential relationship between anthropomorphic factors and the patient’s postoperative satisfaction. Finally, we would like to understand if the considerable expense of computer navigation in knee replacement leads to better postoperative scores. Nevertheless, we keep our option opened to investigate other aspects that were not ‘visibile’ or never thought about it but might reveal themselfs as we move on with the project.
Here we may consider KneeScore_Surgeon or KneeScore_Patient as response and Age, Gendar, Weight, Height, BMI, Race, Side, Manufacturer as predictors. Gendar, Race, Side and Manufacturer are some of the categorical predictors and Age, Height, Weight and BMI are numerical predictors.
knee = read.csv('knees.csv')
knee$Surgeon = as.factor(knee$Surgeon)
knee$Year = as.factor(knee$Year)
plots = function(model) {
par(mfcol=c(2,2))
qqnorm(resid(model), col = red, pch = 16)
qqline(resid(model), col = lightblue, lwd = 2)
plot(fitted(model), resid(model), xlab = 'Fitted Values', ylab = 'Residuals', main = 'Fitted vs. Residuals Plot', col = purple, pch = 16)
abline(h = 0, col = green, lwd = 2)
hist(resid(model), col = blue, main = 'Histogram of Residuals', xlab = 'Residuals')
}
evaluate = function(model){
# Run each of the tests
lambda = 1.000001 # this is a hack so the function will return a CrossValidation score - HatValues of 1.0 are causing a division by zero error.
shapiroTest = shapiro.test(resid(model))
bpTest = bptest(model)
rSquared = summary(model)$r.squared
adjRSquared = summary(model)$adj.r.squared
loocv = sqrt(mean((resid(model) / (lambda - hatvalues(model))) ^ 2))
large_hat = sum(hatvalues(model) > 2 * mean(hatvalues(model)))
large_resid = sum(rstandard(model)[abs(rstandard(model)) > 2]) / (length(rstandard(model) + lambda))
# Collect tests in dataframe
values = data.frame(Result = c(prettyNum(shapiroTest$p.value), prettyNum(bpTest$p.value), prettyNum(rSquared), prettyNum(adjRSquared),prettyNum(loocv), prettyNum(large_hat)))
rowNames = data.frame(Test = c('Shapiro Wilk pvalue', 'Breusch-Pagan pvalue', 'R Squared', 'Adj R Squared', 'Cross Validation', 'Large Hat Values'))
summary_output = cbind(rowNames, values)
show(summary_output)
plots(model)
}
removeInfluential = function(model, data){
large_lev_index = which(hatvalues(model) > 2 * mean(hatvalues(model)))
cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
rStandard = abs(rstandard(model))
rStandardIndex = which(rStandard > 2)
index_ = intersect(intersect(large_lev_index, cooks), rStandardIndex)
newData = data[-index_ , ]
return(newData)
}
IdentifyInfluential = function(model, data){
large_lev_index = which(hatvalues(model) > 2 * mean(hatvalues(model)))
cooks = which(cooks.distance(model) > 4 / length(cooks.distance(model)))
rStandard = abs(rstandard(model))
rStandardIndex = which(rStandard > 2)
index_ = intersect(intersect(large_lev_index, cooks), rStandardIndex)
return(index_)
}
removeLargeHatValues = function(model, data){
largeHat = which(hatvalues(model) > 2 * mean(hatvalues(model)))
newData = data[-largeHat, ]
return(newData)
}
prediction_accuracy = function(model, data, margin) {
upper = data + margin
lower = data - margin
prediction = predict(model, newdata = data)
return(sum(prediction < upper & lower < prediction)) / length(prediction)
}
Begin by fitting a full additive model as a baseline and evaluating.
full_additive_model = lm(KneeScore_Surgeon ~ ., data = knee)
evaluate(full_additive_model)
## Test Result
## 1 Shapiro Wilk pvalue 2.080142e-08
## 2 Breusch-Pagan pvalue 0.9192514
## 3 R Squared 0.2718411
## 4 Adj R Squared 0.2130606
## 5 Cross Validation 14.4126
## 6 Large Hat Values 177
The baseline model has many large leverage values, which, upon inspection may be coming from several of the implant fields. Perhaps removing the suspect implant predictors will help improve the model.
model_eight = lm(KneeScore_Surgeon ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = knee)
evaluate(model_eight)
## Test Result
## 1 Shapiro Wilk pvalue 3.989833e-07
## 2 Breusch-Pagan pvalue 0.1270351
## 3 R Squared 0.2374036
## 4 Adj R Squared 0.2084443
## 5 Cross Validation 14.34844
## 6 Large Hat Values 126
anova(model_eight, full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ (Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter) - Diagnosis - Race - TibialInsertType -
## FemoralComponentModel - PatellaModel - TibialTrayModel -
## TibialTraySize - FemoralComponentType
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1896 379610
## 2 1821 362468 75 17142 1.1483 0.1845
While removing the predictors doesn’t yeild a definatively better model at a reasonable \(\alpha\), it does increase Adjusted \(R^2\) and slightly improves the cross validation score.
Utilize a Bayesian Information Criterion approach and evaluate the selected model.
n = length(resid(full_additive_model))
model_back_bic = step(full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(model_back_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.169168e-05
## 2 Breusch-Pagan pvalue 0.0001309477
## 3 R Squared 0.1869122
## 4 Adj R Squared 0.1852562
## 5 Cross Validation 14.37628
## 6 Large Hat Values 87
anova(model_back_bic, full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Age + Side + KneeScore_Patient
=======
knee_model = lm(KneeScore_Surgeon ~ ., data = knee)
summary(knee_model)
##
## Call:
## lm(formula = KneeScore_Surgeon ~ ., data = knee)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.928 -8.832 -0.234 7.798 49.165
##
## Coefficients: (21 not defined because of singularities)
## Estimate
## (Intercept) 35.69897
## Surgeon2 -0.45208
## Surgeon3 -1.09966
## Surgeon4 -2.11622
## Surgeon5 -3.71883
## Year2011 1.77031
## Year2012 2.02319
## Year2013 -0.52894
## Year2014 3.04851
## Year2015 2.75161
## Year2016 4.20897
## Age 0.28554
## GenderMale 1.06471
## Weight 0.05721
## Height -0.21788
## BMI -0.38019
## DiagnosisInflammatory non-infection -9.50692
## DiagnosisInflammatory non-infection -13.35023
## DiagnosisOsteoarthritis 0.77431
## DiagnosisPaget's disease 18.53356
## DiagnosisPost-trauma -4.33767
## DiagnosisPsoriatic Arthritis -8.02204
## DiagnosisRheumatoid arthritis 7.64676
## DiagnosisRheumatoid arthritis, Inflammatory non-infection 0.39754
## RaceBlack or African American -4.68131
## RaceHispanic -8.44185
## RaceIndian Sub-Continent -19.33729
## RaceMid-East Arabian -2.16159
## RaceMultiple Races -2.87781
## RaceOther -9.18859
## RacePacific Islander -18.94367
## RaceUnknown -7.09720
## RaceWhite -7.75157
## SideLeft 9.63617
## SideRight 9.49269
## KneeScore_Patient 0.25771
## GPSYes 0.32214
## ManufacturerSmith & Nephew -3.85441
## ManufacturerStryker -7.10653
## ManufacturerZimmer 5.48892
## FemoralComponentModelGenesis II -3.62682
## FemoralComponentModelJourney 26.67707
## FemoralComponentModelLegion 1.03307
## FemoralComponentModelNatural Knee II -1.46555
## FemoralComponentModelNexgen 37.16568
## FemoralComponentModelNexgen Legacy 24.09579
## FemoralComponentModelNon-porous -0.44141
## FemoralComponentModelNonporous -21.34760
## FemoralComponentModelPersona -9.23130
## FemoralComponentModelPFC Sigma -12.85311
## FemoralComponentModelSigma -13.79375
## FemoralComponentModelSigma TC3 -39.89483
## FemoralComponentModelTriathlon 4.45165
## FemoralComponentModelTriathlon Total Stabilizer -15.94065
## FemoralComponentModelZimmer Precoat 27.95585
## FemoralComponentSize10 4.83795
## FemoralComponentSize12 -12.15381
## FemoralComponentSize2 2.25157
## FemoralComponentSize2.5 2.33940
## FemoralComponentSize3 4.99422
## FemoralComponentSize3N 14.71634
## FemoralComponentSize4 4.10799
## FemoralComponentSize4N 6.15288
## FemoralComponentSize5 3.40948
## FemoralComponentSize5N 4.87539
## FemoralComponentSize6 1.73559
## FemoralComponentSize6n 8.34940
## FemoralComponentSize6N 4.61078
## FemoralComponentSize7 -0.64904
## FemoralComponentSize7+ -1.16563
## FemoralComponentSize73 0.28220
## FemoralComponentSize8 1.13364
## FemoralComponentSize8+ -3.39821
## FemoralComponentSize9 -4.71932
## FemoralComponentSizeD -11.53379
## FemoralComponentSizeE 0.51213
## FemoralComponentSizeE - NA
## FemoralComponentSizeE- -41.42820
## FemoralComponentSizeF 0.36934
## FemoralComponentSizeG -29.00815
## FemoralComponentSizeH -22.80284
## FemoralComponentTypecruciate retaining -4.18209
## FemoralComponentTypeposterior stabilized -5.41915
## TibialTrayModelGenesis II -0.26612
## TibialTrayModelJourney NA
## TibialTrayModelLegion NA
## TibialTrayModelM.B.T DePuy 21.46956
## TibialTrayModelM.B.T. Revision 25.70339
## TibialTrayModelMBT Keel 16.37305
## TibialTrayModelModular -7.40977
## TibialTrayModelNatural-Knee II NA
## TibialTrayModelNexgen -23.12509
## TibialTrayModelNexgen Legacy NA
## TibialTrayModelPersona NA
## TibialTrayModelPFC Sigma -12.04015
## TibialTrayModelPFC Sigma,Modular NA
## TibialTrayModelTriathlon NA
## TibialTraySize10 -25.22325
## TibialTraySize2 -6.33588
## TibialTraySize2.5 -4.22621
## TibialTraySize3 -6.40416
## TibialTraySize3/4/2016 0:00 -14.20771
## TibialTraySize32 23.28200
## TibialTraySize4 -6.01119
## TibialTraySize5 -5.84794
## TibialTraySize5/6/2016 0:00 7.54123
## TibialTraySize6 -5.81389
## TibialTraySize7 -5.61272
## TibialTraySize8 -6.20706
## TibialTraySize9 -20.39510
## TibialTraySizeD -26.29187
## TibialTraySizeF -22.57134
## TibialTraySizeH NA
## TibialInsertModelAttune NA
## TibialInsertModelDurasul -22.37945
## TibialInsertModelGenesis II -1.77872
## TibialInsertModelGenesis II PS -8.34625
## TibialInsertModelJourney NA
## TibialInsertModelLegion -1.86924
## TibialInsertModelLegion CR XLPE 0.99616
## TibialInsertModelLegion XLPE NA
## TibialInsertModelNatural-Knee II NA
## TibialInsertModelNexgen -4.06310
## TibialInsertModelNexgen Legacy -24.83492
## TibialInsertModelPersona NA
## TibialInsertModelPFC Sigma 28.53028
## TibialInsertModelPFC Sigma Cross-Linked 30.20885
## TibialInsertModelPFC Sigma,Modular NA
## TibialInsertModelProlong NA
## TibialInsertModelRotating Platform NA
## TibialInsertModelScorpio -15.15883
## TibialInsertModelSigma 18.16374
## TibialInsertModelTriathlon -0.21398
## TibialInsertModelTriathlon X3 3.96823
## TibialInsertModelTriathlon X3 Total Stabilizer NA
## TibialInsertWidth -0.07218
## TibialInsertTypeCS (Condylar Stabilizing) 3.63316
## TibialInsertTypecurved 3.31718
## TibialInsertTypecurved,Fixed bearing 2.40896
## TibialInsertTypedished 0.71376
## TibialInsertTypeFixed bearing 5.32072
## TibialInsertTypeFixed bearing,curved 1.73647
## TibialInsertTypeFixed bearing,posterior stabilized -5.91112
## TibialInsertTypehigh flex 2.81563
## TibialInsertTypehigh flex,curved -15.41864
## TibialInsertTypehigh flex,Fixed bearing -1.96367
## TibialInsertTypelipped,Fixed bearing 10.80613
## TibialInsertTypemobile bearing 2.98318
## TibialInsertTypemobile bearing,posterior stabilized -9.80464
## TibialInsertTypeposterior stabilized 4.52890
## TibialInsertTypeposterior stabilized,curved 12.18708
## TibialInsertTypeposterior stabilized,curved,Fixed bearing 3.28908
## TibialInsertTypeposterior stabilized,Fixed bearing 0.05072
## TibialInsertTypestandard 6.39496
## TibialInsertTypestandard,Fixed bearing 1.27742
## TibialInsertTypeTS (Total Stabilizer) 1.13084
## TibialInsertTypeTS (Total Stabilizer),Fixed bearing -9.14234
## PatellaModelAttune NA
## PatellaModelGenesis II 5.01862
## PatellaModelNatural-Knee II NA
## PatellaModelNexGen 4.21824
## PatellaModelNexGen Legacy -2.87566
## PatellaModelOval Dome Patella 3-peg -3.80728
## PatellaModelPatella Round Dome -6.95292
## PatellaModelPersona NA
## PatellaModelPFC Sigma -0.71954
## PatellaModelTriathlon -2.19911
## PatellaModelTriathlon Asymmetric Patella 0.05584
## PatellaDiameter 0.05812
## Std. Error
## (Intercept) 32.38422
## Surgeon2 1.94491
## Surgeon3 1.84162
## Surgeon4 1.74183
## Surgeon5 1.97887
## Year2011 1.39996
## Year2012 1.38108
## Year2013 1.39028
## Year2014 1.59422
## Year2015 1.80446
## Year2016 2.93910
## Age 0.04184
## GenderMale 1.18194
## Weight 0.07240
## Height 0.44334
## BMI 0.45608
## DiagnosisInflammatory non-infection 9.13640
## DiagnosisInflammatory non-infection 10.21111
## DiagnosisOsteoarthritis 5.50850
## DiagnosisPaget's disease 15.40286
## DiagnosisPost-trauma 6.11385
## DiagnosisPsoriatic Arthritis 15.34624
## DiagnosisRheumatoid arthritis 6.41512
## DiagnosisRheumatoid arthritis, Inflammatory non-infection 9.94864
## RaceBlack or African American 4.42316
## RaceHispanic 3.81954
## RaceIndian Sub-Continent 10.71280
## RaceMid-East Arabian 10.59663
## RaceMultiple Races 10.64078
## RaceOther 4.62764
## RacePacific Islander 14.60124
## RaceUnknown 4.95087
## RaceWhite 3.28429
## SideLeft 0.87505
## SideRight 0.85947
## KneeScore_Patient 0.01891
## GPSYes 1.03986
## ManufacturerSmith & Nephew 8.37257
## ManufacturerStryker 7.39101
## ManufacturerZimmer 10.94542
## FemoralComponentModelGenesis II 14.30536
## FemoralComponentModelJourney 18.59560
## FemoralComponentModelLegion 13.63817
## FemoralComponentModelNatural Knee II 21.05376
## FemoralComponentModelNexgen 27.74342
## FemoralComponentModelNexgen Legacy 28.32308
## FemoralComponentModelNon-porous 23.67199
## FemoralComponentModelNonporous 23.28013
## FemoralComponentModelPersona 10.30303
## FemoralComponentModelPFC Sigma 19.73980
## FemoralComponentModelSigma 18.30815
## FemoralComponentModelSigma TC3 23.50602
## FemoralComponentModelTriathlon 11.80054
## FemoralComponentModelTriathlon Total Stabilizer 14.39678
## FemoralComponentModelZimmer Precoat 27.52343
## FemoralComponentSize10 14.93298
## FemoralComponentSize12 16.77194
## FemoralComponentSize2 8.49888
## FemoralComponentSize2.5 9.02211
## FemoralComponentSize3 8.33040
## FemoralComponentSize3N 11.71841
## FemoralComponentSize4 8.36042
## FemoralComponentSize4N 8.53076
## FemoralComponentSize5 8.42169
## FemoralComponentSize5N 8.63502
## FemoralComponentSize6 8.52588
## FemoralComponentSize6n 13.27649
## FemoralComponentSize6N 8.84450
## FemoralComponentSize7 8.67281
## FemoralComponentSize7+ 16.66054
## FemoralComponentSize73 16.71433
## FemoralComponentSize8 8.96390
## FemoralComponentSize8+ 16.70852
## FemoralComponentSize9 11.03906
## FemoralComponentSizeD 16.13281
## FemoralComponentSizeE 15.05931
## FemoralComponentSizeE - NA
## FemoralComponentSizeE- 20.99607
## FemoralComponentSizeF 16.01777
## FemoralComponentSizeG 19.46720
## FemoralComponentSizeH 28.87886
## FemoralComponentTypecruciate retaining 8.81170
## FemoralComponentTypeposterior stabilized 8.81335
## TibialTrayModelGenesis II 5.19401
## TibialTrayModelJourney NA
## TibialTrayModelLegion NA
## TibialTrayModelM.B.T DePuy 18.80335
## TibialTrayModelM.B.T. Revision 21.91736
## TibialTrayModelMBT Keel 18.18146
## TibialTrayModelModular 27.21795
## TibialTrayModelNatural-Knee II NA
## TibialTrayModelNexgen 31.68197
## TibialTrayModelNexgen Legacy NA
## TibialTrayModelPersona NA
## TibialTrayModelPFC Sigma 22.91731
## TibialTrayModelPFC Sigma,Modular NA
## TibialTrayModelTriathlon NA
## TibialTraySize10 15.52482
## TibialTraySize2 10.51634
## TibialTraySize2.5 10.86921
## TibialTraySize3 10.58328
## TibialTraySize3/4/2016 0:00 13.50168
## TibialTraySize32 14.98073
## TibialTraySize4 10.60363
## TibialTraySize5 10.68184
## TibialTraySize5/6/2016 0:00 17.81406
## TibialTraySize6 10.79266
## TibialTraySize7 10.91583
## TibialTraySize8 11.19861
## TibialTraySize9 13.37900
## TibialTraySizeD 18.23804
## TibialTraySizeF 18.39743
## TibialTraySizeH NA
## TibialInsertModelAttune NA
## TibialInsertModelDurasul 22.01654
## TibialInsertModelGenesis II 5.03691
## TibialInsertModelGenesis II PS 7.03771
## TibialInsertModelJourney NA
## TibialInsertModelLegion 5.07040
## TibialInsertModelLegion CR XLPE 5.21939
## TibialInsertModelLegion XLPE NA
## TibialInsertModelNatural-Knee II NA
## TibialInsertModelNexgen 12.55582
## TibialInsertModelNexgen Legacy 16.41055
## TibialInsertModelPersona NA
## TibialInsertModelPFC Sigma 14.47970
## TibialInsertModelPFC Sigma Cross-Linked 14.84704
## TibialInsertModelPFC Sigma,Modular NA
## TibialInsertModelProlong NA
## TibialInsertModelRotating Platform NA
## TibialInsertModelScorpio 15.23609
## TibialInsertModelSigma 16.04480
## TibialInsertModelTriathlon 4.82877
## TibialInsertModelTriathlon X3 4.96113
## TibialInsertModelTriathlon X3 Total Stabilizer NA
## TibialInsertWidth 0.20280
## TibialInsertTypeCS (Condylar Stabilizing) 7.52301
## TibialInsertTypecurved 8.12192
## TibialInsertTypecurved,Fixed bearing 7.89182
## TibialInsertTypedished 8.00652
## TibialInsertTypeFixed bearing 7.85434
## TibialInsertTypeFixed bearing,curved 13.13132
## TibialInsertTypeFixed bearing,posterior stabilized 13.56780
## TibialInsertTypehigh flex 7.34540
## TibialInsertTypehigh flex,curved 12.63351
## TibialInsertTypehigh flex,Fixed bearing 17.30055
## TibialInsertTypelipped,Fixed bearing 16.92708
## TibialInsertTypemobile bearing 9.08744
## TibialInsertTypemobile bearing,posterior stabilized 14.31335
## TibialInsertTypeposterior stabilized 7.53633
## TibialInsertTypeposterior stabilized,curved 16.24873
## TibialInsertTypeposterior stabilized,curved,Fixed bearing 16.93149
## TibialInsertTypeposterior stabilized,Fixed bearing 8.52750
## TibialInsertTypestandard 7.48079
## TibialInsertTypestandard,Fixed bearing 17.83097
## TibialInsertTypeTS (Total Stabilizer) 8.46987
## TibialInsertTypeTS (Total Stabilizer),Fixed bearing 16.23051
## PatellaModelAttune NA
## PatellaModelGenesis II 7.53765
## PatellaModelNatural-Knee II NA
## PatellaModelNexGen 32.17471
## PatellaModelNexGen Legacy 11.24018
## PatellaModelOval Dome Patella 3-peg 3.54140
## PatellaModelPatella Round Dome 4.56766
## PatellaModelPersona NA
## PatellaModelPFC Sigma 15.84998
## PatellaModelTriathlon 7.95086
## PatellaModelTriathlon Asymmetric Patella 6.93708
## PatellaDiameter 0.06319
## t value
## (Intercept) 1.102
## Surgeon2 -0.232
## Surgeon3 -0.597
## Surgeon4 -1.215
## Surgeon5 -1.879
## Year2011 1.265
## Year2012 1.465
## Year2013 -0.380
## Year2014 1.912
## Year2015 1.525
## Year2016 1.432
## Age 6.825
## GenderMale 0.901
## Weight 0.790
## Height -0.491
## BMI -0.834
## DiagnosisInflammatory non-infection -1.041
## DiagnosisInflammatory non-infection -1.307
## DiagnosisOsteoarthritis 0.141
## DiagnosisPaget's disease 1.203
## DiagnosisPost-trauma -0.709
## DiagnosisPsoriatic Arthritis -0.523
## DiagnosisRheumatoid arthritis 1.192
## DiagnosisRheumatoid arthritis, Inflammatory non-infection 0.040
## RaceBlack or African American -1.058
## RaceHispanic -2.210
## RaceIndian Sub-Continent -1.805
## RaceMid-East Arabian -0.204
## RaceMultiple Races -0.270
## RaceOther -1.986
## RacePacific Islander -1.297
## RaceUnknown -1.434
## RaceWhite -2.360
## SideLeft 11.012
## SideRight 11.045
## KneeScore_Patient 13.627
## GPSYes 0.310
## ManufacturerSmith & Nephew -0.460
## ManufacturerStryker -0.962
## ManufacturerZimmer 0.501
## FemoralComponentModelGenesis II -0.254
## FemoralComponentModelJourney 1.435
## FemoralComponentModelLegion 0.076
## FemoralComponentModelNatural Knee II -0.070
## FemoralComponentModelNexgen 1.340
## FemoralComponentModelNexgen Legacy 0.851
## FemoralComponentModelNon-porous -0.019
## FemoralComponentModelNonporous -0.917
## FemoralComponentModelPersona -0.896
## FemoralComponentModelPFC Sigma -0.651
## FemoralComponentModelSigma -0.753
## FemoralComponentModelSigma TC3 -1.697
## FemoralComponentModelTriathlon 0.377
## FemoralComponentModelTriathlon Total Stabilizer -1.107
## FemoralComponentModelZimmer Precoat 1.016
## FemoralComponentSize10 0.324
## FemoralComponentSize12 -0.725
## FemoralComponentSize2 0.265
## FemoralComponentSize2.5 0.259
## FemoralComponentSize3 0.600
## FemoralComponentSize3N 1.256
## FemoralComponentSize4 0.491
## FemoralComponentSize4N 0.721
## FemoralComponentSize5 0.405
## FemoralComponentSize5N 0.565
## FemoralComponentSize6 0.204
## FemoralComponentSize6n 0.629
## FemoralComponentSize6N 0.521
## FemoralComponentSize7 -0.075
## FemoralComponentSize7+ -0.070
## FemoralComponentSize73 0.017
## FemoralComponentSize8 0.126
## FemoralComponentSize8+ -0.203
## FemoralComponentSize9 -0.428
## FemoralComponentSizeD -0.715
## FemoralComponentSizeE 0.034
## FemoralComponentSizeE - NA
## FemoralComponentSizeE- -1.973
## FemoralComponentSizeF 0.023
## FemoralComponentSizeG -1.490
## FemoralComponentSizeH -0.790
## FemoralComponentTypecruciate retaining -0.475
## FemoralComponentTypeposterior stabilized -0.615
## TibialTrayModelGenesis II -0.051
## TibialTrayModelJourney NA
## TibialTrayModelLegion NA
## TibialTrayModelM.B.T DePuy 1.142
## TibialTrayModelM.B.T. Revision 1.173
## TibialTrayModelMBT Keel 0.901
## TibialTrayModelModular -0.272
## TibialTrayModelNatural-Knee II NA
## TibialTrayModelNexgen -0.730
## TibialTrayModelNexgen Legacy NA
## TibialTrayModelPersona NA
## TibialTrayModelPFC Sigma -0.525
## TibialTrayModelPFC Sigma,Modular NA
## TibialTrayModelTriathlon NA
## TibialTraySize10 -1.625
## TibialTraySize2 -0.602
## TibialTraySize2.5 -0.389
## TibialTraySize3 -0.605
## TibialTraySize3/4/2016 0:00 -1.052
## TibialTraySize32 1.554
## TibialTraySize4 -0.567
## TibialTraySize5 -0.547
## TibialTraySize5/6/2016 0:00 0.423
## TibialTraySize6 -0.539
## TibialTraySize7 -0.514
## TibialTraySize8 -0.554
## TibialTraySize9 -1.524
## TibialTraySizeD -1.442
## TibialTraySizeF -1.227
## TibialTraySizeH NA
## TibialInsertModelAttune NA
## TibialInsertModelDurasul -1.016
## TibialInsertModelGenesis II -0.353
## TibialInsertModelGenesis II PS -1.186
## TibialInsertModelJourney NA
## TibialInsertModelLegion -0.369
## TibialInsertModelLegion CR XLPE 0.191
## TibialInsertModelLegion XLPE NA
## TibialInsertModelNatural-Knee II NA
## TibialInsertModelNexgen -0.324
## TibialInsertModelNexgen Legacy -1.513
## TibialInsertModelPersona NA
## TibialInsertModelPFC Sigma 1.970
## TibialInsertModelPFC Sigma Cross-Linked 2.035
## TibialInsertModelPFC Sigma,Modular NA
## TibialInsertModelProlong NA
## TibialInsertModelRotating Platform NA
## TibialInsertModelScorpio -0.995
## TibialInsertModelSigma 1.132
## TibialInsertModelTriathlon -0.044
## TibialInsertModelTriathlon X3 0.800
## TibialInsertModelTriathlon X3 Total Stabilizer NA
## TibialInsertWidth -0.356
## TibialInsertTypeCS (Condylar Stabilizing) 0.483
## TibialInsertTypecurved 0.408
## TibialInsertTypecurved,Fixed bearing 0.305
## TibialInsertTypedished 0.089
## TibialInsertTypeFixed bearing 0.677
## TibialInsertTypeFixed bearing,curved 0.132
## TibialInsertTypeFixed bearing,posterior stabilized -0.436
## TibialInsertTypehigh flex 0.383
## TibialInsertTypehigh flex,curved -1.220
## TibialInsertTypehigh flex,Fixed bearing -0.114
## TibialInsertTypelipped,Fixed bearing 0.638
## TibialInsertTypemobile bearing 0.328
## TibialInsertTypemobile bearing,posterior stabilized -0.685
## TibialInsertTypeposterior stabilized 0.601
## TibialInsertTypeposterior stabilized,curved 0.750
## TibialInsertTypeposterior stabilized,curved,Fixed bearing 0.194
## TibialInsertTypeposterior stabilized,Fixed bearing 0.006
## TibialInsertTypestandard 0.855
## TibialInsertTypestandard,Fixed bearing 0.072
## TibialInsertTypeTS (Total Stabilizer) 0.134
## TibialInsertTypeTS (Total Stabilizer),Fixed bearing -0.563
## PatellaModelAttune NA
## PatellaModelGenesis II 0.666
## PatellaModelNatural-Knee II NA
## PatellaModelNexGen 0.131
## PatellaModelNexGen Legacy -0.256
## PatellaModelOval Dome Patella 3-peg -1.075
## PatellaModelPatella Round Dome -1.522
## PatellaModelPersona NA
## PatellaModelPFC Sigma -0.045
## PatellaModelTriathlon -0.277
## PatellaModelTriathlon Asymmetric Patella 0.008
## PatellaDiameter 0.920
## Pr(>|t|)
## (Intercept) 0.2705
## Surgeon2 0.8162
## Surgeon3 0.5505
## Surgeon4 0.2245
## Surgeon5 0.0604 .
## Year2011 0.2062
## Year2012 0.1431
## Year2013 0.7037
## Year2014 0.0560 .
## Year2015 0.1275
## Year2016 0.1523
## Age 1.19e-11 ***
## GenderMale 0.3678
## Weight 0.4295
## Height 0.6232
## BMI 0.4046
## DiagnosisInflammatory non-infection 0.2982
## DiagnosisInflammatory non-infection 0.1912
## DiagnosisOsteoarthritis 0.8882
## DiagnosisPaget's disease 0.2290
## DiagnosisPost-trauma 0.4781
## DiagnosisPsoriatic Arthritis 0.6012
## DiagnosisRheumatoid arthritis 0.2334
## DiagnosisRheumatoid arthritis, Inflammatory non-infection 0.9681
## RaceBlack or African American 0.2900
## RaceHispanic 0.0272 *
## RaceIndian Sub-Continent 0.0712 .
## RaceMid-East Arabian 0.8384
## RaceMultiple Races 0.7868
## RaceOther 0.0472 *
## RacePacific Islander 0.1947
## RaceUnknown 0.1519
## RaceWhite 0.0184 *
## SideLeft < 2e-16 ***
## SideRight < 2e-16 ***
## KneeScore_Patient < 2e-16 ***
## GPSYes 0.7568
## ManufacturerSmith & Nephew 0.6453
## ManufacturerStryker 0.3364
## ManufacturerZimmer 0.6161
## FemoralComponentModelGenesis II 0.7999
## FemoralComponentModelJourney 0.1516
## FemoralComponentModelLegion 0.9396
## FemoralComponentModelNatural Knee II 0.9445
## FemoralComponentModelNexgen 0.1805
## FemoralComponentModelNexgen Legacy 0.3950
## FemoralComponentModelNon-porous 0.9851
## FemoralComponentModelNonporous 0.3593
## FemoralComponentModelPersona 0.3704
## FemoralComponentModelPFC Sigma 0.5150
## FemoralComponentModelSigma 0.4513
## FemoralComponentModelSigma TC3 0.0898 .
## FemoralComponentModelTriathlon 0.7060
## FemoralComponentModelTriathlon Total Stabilizer 0.2683
## FemoralComponentModelZimmer Precoat 0.3099
## FemoralComponentSize10 0.7460
## FemoralComponentSize12 0.4688
## FemoralComponentSize2 0.7911
## FemoralComponentSize2.5 0.7954
## FemoralComponentSize3 0.5489
## FemoralComponentSize3N 0.2093
## FemoralComponentSize4 0.6232
## FemoralComponentSize4N 0.4708
## FemoralComponentSize5 0.6856
## FemoralComponentSize5N 0.5724
## FemoralComponentSize6 0.8387
## FemoralComponentSize6n 0.5295
## FemoralComponentSize6N 0.6022
## FemoralComponentSize7 0.9404
## FemoralComponentSize7+ 0.9442
## FemoralComponentSize73 0.9865
## FemoralComponentSize8 0.8994
## FemoralComponentSize8+ 0.8389
## FemoralComponentSize9 0.6691
## FemoralComponentSizeD 0.4747
## FemoralComponentSizeE 0.9729
## FemoralComponentSizeE - NA
## FemoralComponentSizeE- 0.0486 *
## FemoralComponentSizeF 0.9816
## FemoralComponentSizeG 0.1364
## FemoralComponentSizeH 0.4299
## FemoralComponentTypecruciate retaining 0.6351
## FemoralComponentTypeposterior stabilized 0.5387
## TibialTrayModelGenesis II 0.9591
## TibialTrayModelJourney NA
## TibialTrayModelLegion NA
## TibialTrayModelM.B.T DePuy 0.2537
## TibialTrayModelM.B.T. Revision 0.2411
## TibialTrayModelMBT Keel 0.3680
## TibialTrayModelModular 0.7855
## TibialTrayModelNatural-Knee II NA
## TibialTrayModelNexgen 0.4655
## TibialTrayModelNexgen Legacy NA
## TibialTrayModelPersona NA
## TibialTrayModelPFC Sigma 0.5994
## TibialTrayModelPFC Sigma,Modular NA
## TibialTrayModelTriathlon NA
## TibialTraySize10 0.1044
## TibialTraySize2 0.5469
## TibialTraySize2.5 0.6975
## TibialTraySize3 0.5452
## TibialTraySize3/4/2016 0:00 0.2928
## TibialTraySize32 0.1203
## TibialTraySize4 0.5709
## TibialTraySize5 0.5841
## TibialTraySize5/6/2016 0:00 0.6721
## TibialTraySize6 0.5902
## TibialTraySize7 0.6072
## TibialTraySize8 0.5795
## TibialTraySize9 0.1276
## TibialTraySizeD 0.1496
## TibialTraySizeF 0.2200
## TibialTraySizeH NA
## TibialInsertModelAttune NA
## TibialInsertModelDurasul 0.3095
## TibialInsertModelGenesis II 0.7240
## TibialInsertModelGenesis II PS 0.2358
## TibialInsertModelJourney NA
## TibialInsertModelLegion 0.7124
## TibialInsertModelLegion CR XLPE 0.8487
## TibialInsertModelLegion XLPE NA
## TibialInsertModelNatural-Knee II NA
## TibialInsertModelNexgen 0.7463
## TibialInsertModelNexgen Legacy 0.1304
## TibialInsertModelPersona NA
## TibialInsertModelPFC Sigma 0.0489 *
## TibialInsertModelPFC Sigma Cross-Linked 0.0420 *
## TibialInsertModelPFC Sigma,Modular NA
## TibialInsertModelProlong NA
## TibialInsertModelRotating Platform NA
## TibialInsertModelScorpio 0.3199
## TibialInsertModelSigma 0.2578
## TibialInsertModelTriathlon 0.9647
## TibialInsertModelTriathlon X3 0.4239
## TibialInsertModelTriathlon X3 Total Stabilizer NA
## TibialInsertWidth 0.7219
## TibialInsertTypeCS (Condylar Stabilizing) 0.6292
## TibialInsertTypecurved 0.6830
## TibialInsertTypecurved,Fixed bearing 0.7602
## TibialInsertTypedished 0.9290
## TibialInsertTypeFixed bearing 0.4982
## TibialInsertTypeFixed bearing,curved 0.8948
## TibialInsertTypeFixed bearing,posterior stabilized 0.6631
## TibialInsertTypehigh flex 0.7015
## TibialInsertTypehigh flex,curved 0.2225
## TibialInsertTypehigh flex,Fixed bearing 0.9096
## TibialInsertTypelipped,Fixed bearing 0.5233
## TibialInsertTypemobile bearing 0.7427
## TibialInsertTypemobile bearing,posterior stabilized 0.4934
## TibialInsertTypeposterior stabilized 0.5480
## TibialInsertTypeposterior stabilized,curved 0.4533
## TibialInsertTypeposterior stabilized,curved,Fixed bearing 0.8460
## TibialInsertTypeposterior stabilized,Fixed bearing 0.9953
## TibialInsertTypestandard 0.3927
## TibialInsertTypestandard,Fixed bearing 0.9429
## TibialInsertTypeTS (Total Stabilizer) 0.8938
## TibialInsertTypeTS (Total Stabilizer),Fixed bearing 0.5733
## PatellaModelAttune NA
## PatellaModelGenesis II 0.5056
## PatellaModelNatural-Knee II NA
## PatellaModelNexGen 0.8957
## PatellaModelNexGen Legacy 0.7981
## PatellaModelOval Dome Patella 3-peg 0.2825
## PatellaModelPatella Round Dome 0.1281
## PatellaModelPersona NA
## PatellaModelPFC Sigma 0.9638
## PatellaModelTriathlon 0.7821
## PatellaModelTriathlon Asymmetric Patella 0.9936
## PatellaDiameter 0.3578
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.11 on 1821 degrees of freedom
## Multiple R-squared: 0.2718, Adjusted R-squared: 0.2131
## F-statistic: 4.625 on 147 and 1821 DF, p-value: < 2.2e-16
knee_model_aic = step(knee_model, direction = "backward", trace = 0)
summary(knee_model_aic)$call
## lm(formula = KneeScore_Surgeon ~ Surgeon + Year + Age + Side +
## KneeScore_Patient, data = knee)
knee_model_bic = step(knee_model, direction = "backward", trace = 0)
summary(knee_model_bic)$call
## lm(formula = KneeScore_Surgeon ~ Surgeon + Year + Age + Side +
## KneeScore_Patient, data = knee)
anova(knee_model_aic, knee_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Surgeon ~ Surgeon + Year + Age + Side + KneeScore_Patient
>>>>>>> acc04d98d5bfe4ca322513e908756d502a7488a5
## Model 2: KneeScore_Surgeon ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Patient +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
<<<<<<< HEAD
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1964 404744
## 2 1821 362468 143 42276 1.4853 0.000288 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting BIC model scores marginally better in Cross Validation though the anova results clearly prefer the full additive model.
Next, we will evaluate the RMSE of the best performing models thus far.
options(warn=-1)
sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]
rmse = function(actual, predicted) { sqrt(mean((actual - predicted) ^ 2))}
full_test = rmse(test_data$KneeScore_Surgeon, predict(full_additive_model, test_data))
full_train = rmse(train_data$KneeScore_Surgeon, predict(full_additive_model, train_data))
test_mod_8 = rmse(test_data$KneeScore_Surgeon, predict(model_eight, test_data))
train_mod_8 = rmse(train_data$KneeScore_Surgeon, predict(model_eight, train_data))
bic_test = rmse(test_data$KneeScore_Surgeon, predict(model_back_bic, test_data))
bic_train = rmse(train_data$KneeScore_Surgeon, predict(model_back_bic, train_data))
Model
Train RMSE
Test RMSE
Model All+
13.5971116
13.4739255
Model 9
13.9817512
13.5709474
Model BIC
14.4076613
14.1099068
** how are these evalueated again?**
Perhaps removing some influential data points will assist in creating a better model. Next we remove influential points that satisfy the criteria for Large Leverage, Large Cook's Distance and Large RStandard values.
new_dataframe = removeInfluential(model_eight, knee)
model_nine = lm(KneeScore_Surgeon ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = new_dataframe)
evaluate(model_nine)
## Test Result
## 1 Shapiro Wilk pvalue 6.718107e-07
## 2 Breusch-Pagan pvalue 0.2390135
## 3 R Squared 0.241005
## 4 Adj R Squared 0.2124774
## 5 Cross Validation 14.17558
## 6 Large Hat Values 119

This does seem to be of moderate benefit in relation to Adjusted \(R^2\) and Cross Validation.
We will evaluate the same model one again after removing the identified large hat values.
remove_hat_dataframe = removeLargeHatValues(model_nine, new_dataframe)
model_ten = lm(KneeScore_Surgeon ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = remove_hat_dataframe)
evaluate(model_ten)
## Test Result
## 1 Shapiro Wilk pvalue 3.604033e-06
## 2 Breusch-Pagan pvalue 0.08198251
## 3 R Squared 0.2187277
## 4 Adj R Squared 0.2013757
## 5 Cross Validation 14.32124
## 6 Large Hat Values 93

Removing the large hat values decreased the Cross Validation score. Perhaps we can find a better model by running BIC on the modified dataframe.
n = length(resid(model_ten))
model_10_bic = step(model_ten, direction = "backward", k = log(n), trace = 0)
evaluate(model_10_bic)
## Test Result
## 1 Shapiro Wilk pvalue 0.0001065612
## 2 Breusch-Pagan pvalue 8.666681e-05
## 3 R Squared 0.1862472
## 4 Adj R Squared 0.1844753
## 5 Cross Validation 14.33378
## 6 Large Hat Values 79

Despite the reduction in leveraged values the selected model is no better than preceeding models.
sample_idx_two = sample(1:nrow(remove_hat_dataframe), 1300)
train_data_two = remove_hat_dataframe[sample_idx_two, ]
test_data_two = remove_hat_dataframe[-sample_idx_two, ]
test_mod_10 = rmse(test_data_two$KneeScore_Surgeon, predict(model_ten, train_data_two))
train_mod_10 = rmse(train_data_two$KneeScore_Surgeon, predict(model_ten, train_data_two))
test_10_bic = rmse(test_data_two$KneeScore_Surgeon, predict(model_10_bic, train_data_two))
train_10_bic = rmse(train_data_two$KneeScore_Surgeon, predict(model_10_bic, train_data_two))
Model
Train RMSE
Test RMSE
Model 10
13.7957748
17.5150555
Model 10 BIC
14.1082151
17.282833
model = lm(KneeScore_Surgeon ~. , data = knee)
model_back_aic = step(model, direction = "backward", trace = 0)
model_for_aic = step(model, direction = "forward", trace = 0)
model_both_aic = step(model, direction = "both", trace = 0)
n = length(resid(model))
model_back_bic = step(model, direction = "backward", k = log(n), trace = 0)
model_for_bic = step(model, direction = "forward", k = log(n), trace = 0)
model_both_bic = step(model, direction = "both", k = log(n), trace = 0)
evaluate(model)
## Test Result
## 1 Shapiro Wilk pvalue 2.080142e-08
## 2 Breusch-Pagan pvalue 0.9192514
## 3 R Squared 0.2718411
## 4 Adj R Squared 0.2130606
## 5 Cross Validation 14.4126
## 6 Large Hat Values 177
evaluate(model_back_aic)
## Test Result
## 1 Shapiro Wilk pvalue 1.360071e-06
## 2 Breusch-Pagan pvalue 0.0002559995
## 3 R Squared 0.2062052
## 4 Adj R Squared 0.2005178
## 5 Cross Validation 14.27567
## 6 Large Hat Values 36
evaluate(model_for_aic)
## Test Result
## 1 Shapiro Wilk pvalue 2.080142e-08
## 2 Breusch-Pagan pvalue 0.9192514
## 3 R Squared 0.2718411
## 4 Adj R Squared 0.2130606
## 5 Cross Validation 14.4126
## 6 Large Hat Values 177
evaluate(model_both_aic)
## Test Result
## 1 Shapiro Wilk pvalue 1.360071e-06
## 2 Breusch-Pagan pvalue 0.0002559995
## 3 R Squared 0.2062052
## 4 Adj R Squared 0.2005178
## 5 Cross Validation 14.27567
## 6 Large Hat Values 36
## Adj.R.Squared: 0.24 - CrossValidation: 16.9
evaluate(model_back_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.169168e-05
## 2 Breusch-Pagan pvalue 0.0001309477
## 3 R Squared 0.1869122
## 4 Adj R Squared 0.1852562
## 5 Cross Validation 14.37628
## 6 Large Hat Values 87
evaluate(model_for_bic)
## Test Result
## 1 Shapiro Wilk pvalue 2.080142e-08
## 2 Breusch-Pagan pvalue 0.9192514
## 3 R Squared 0.2718411
## 4 Adj R Squared 0.2130606
## 5 Cross Validation 14.4126
## 6 Large Hat Values 177
evaluate(model_both_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.169168e-05
## 2 Breusch-Pagan pvalue 0.0001309477
## 3 R Squared 0.1869122
## 4 Adj R Squared 0.1852562
## 5 Cross Validation 14.37628
## 6 Large Hat Values 87
## Remove large leverages
new_data = removeInfluential(model, knee)
model_without_large_leverages = lm(KneeScore_Surgeon ~ ., data = new_data)
evaluate(model_without_large_leverages)
## Test Result
## 1 Shapiro Wilk pvalue 1.780351e-08
## 2 Breusch-Pagan pvalue 0.976812
## 3 R Squared 0.2753609
## 4 Adj R Squared 0.2167679
## 5 Cross Validation 14.32084
## 6 Large Hat Values 179
Removing the records with large hat values and refitting simply introduces more large hat values. What is unique about these rows of data?
which(hatvalues(model) == 1.0)
## 17 19 74 240 244 258 354 533 638 703 763 844 876 881 891
## 17 19 74 240 244 258 354 533 638 703 763 844 876 881 891
## 1047 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927 1938
## 1047 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927 1938
print(knee[17,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race Side
## 17 3 2010 68 Female 148 64 25.43 Osteoarthritis White Left
## KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 17 48 50 No DePuy (J&J)
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 17 Nonporous 2 cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 17 PFC Sigma 2.5 PFC Sigma 8
## TibialInsertType PatellaModel PatellaDiameter
## 17 curved 3-Post Round Dome 32
print(knee[240,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race
## 240 3 2010 66 Female 130 68 19.78 Osteoarthritis White
## Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 240 Bilateral 60 80 No Zimmer
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 240 Zimmer Precoat E - cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 240 Nexgen 4 Nexgen 10
## TibialInsertType PatellaModel PatellaDiameter
## 240 standard NexGen 32
print(knee[638,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race Side
## 638 3 2012 67 Female 167 62 30.57 Osteoarthritis White Right
## KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 638 55 55 Yes DePuy (J&J)
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 638 Sigma 4N cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 638 PFC Sigma 3 PFC Sigma 8
## TibialInsertType PatellaModel PatellaDiameter
## 638 high flex,Fixed bearing Triathlon Asymmetric Patella 35
print(knee[1093,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race
## 1093 3 2013 83 Female 130 63 23.05 Osteoarthritis White
## Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 1093 Right 66 50 No DePuy (J&J)
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1093 Sigma 3 posterior stabilized
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1093 PFC Sigma 2.5 PFC Sigma 8
## TibialInsertType PatellaModel PatellaDiameter
## 1093 posterior stabilized,curved Oval Dome Patella 3-peg 35
print(knee[1775, ])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race Side
## 1775 3 2014 22 Male 240 77 28.49 Post-trauma White Right
## KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 1775 32 65 No Zimmer
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1775 Persona 12 cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1775 Persona H Persona 10
## TibialInsertType PatellaModel PatellaDiameter
## 1775 Fixed bearing Persona 38
Each of these has a suspect factor entry. Perhaps the entry only occurs infrequently in the database and is causing the entire row to be flagged as high leverage? Try pulling the suspect columns out of the LM and see what happens.
model_nine = lm(KneeScore_Surgeon ~ . - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - FemoralComponentType, data = knee)
evaluate(model_nine)
## Test Result
## 1 Shapiro Wilk pvalue 8.105873e-08
## 2 Breusch-Pagan pvalue 0.1679994
## 3 R Squared 0.2556416
## 4 Adj R Squared 0.2141109
## 5 Cross Validation 14.34666
## 6 Large Hat Values 138
plot(fitted(knee_model_aic), resid(knee_model_aic), col = "magenta", pch = 20, cex = 1.5, xlab = "Fitted", ylab = "Residuals")
abline(h=0, lwd = 2, col = "olivedrab1")
qqnorm(resid(knee_model_aic), main = "Normal QQ Plot - Knee Model AIC", col = "purple4", pch = 20, cex = 1.5)
qqline(resid(knee_model_aic), col = "Orange", lwd = 3)
library(lmtest)
bptest(knee_model_aic)
##
## studentized Breusch-Pagan test
##
## data: knee_model_aic
## BP = 39.99, df = 14, p-value = 0.000256
shapiro.test(resid(knee_model_aic))
##
## Shapiro-Wilk normality test
##
## data: resid(knee_model_aic)
## W = 0.99459, p-value = 1.36e-06
>>>>>>> acc04d98d5bfe4ca322513e908756d502a7488a5
Begin by fitting a full additive model as a baseline and evaluating.
full_additive_model = lm(KneeScore_Patient ~ ., data = knee)
evaluate(full_additive_model)
## Test Result
## 1 Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue 0.9539336
## 3 R Squared 0.3161338
## 4 Adj R Squared 0.2609288
## 5 Cross Validation 17.07365
## 6 Large Hat Values 178
<<<<<<< HEAD
The baseline model has many large leverage values, which, upon inspection may be coming from several of the implant fields. Perhaps removing the suspect implant predictors will help improve the model.
model_eight = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = knee)
evaluate(model_eight)
## Test Result
## 1 Shapiro Wilk pvalue 1.115521e-14
## 2 Breusch-Pagan pvalue 0.00453633
## 3 R Squared 0.2810507
## 4 Adj R Squared 0.2537489
## 5 Cross Validation 16.88212
## 6 Large Hat Values 126
<<<<<<< HEAD
anova(model_eight, full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Patient ~ (Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter) - Diagnosis - Race - TibialInsertType -
## FemoralComponentModel - PatellaModel - TibialTrayModel -
## TibialTraySize - FemoralComponentType
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1896 530943
## 2 1821 505034 75 25909 1.2456 0.07874 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
While removing the predictors doesn’t yeild a definatively better model at a reasonable \(\alpha\), it does increase Adjusted \(R^2\) and slightly improves the cross validation score.
Utilize a Bayesian Information Criterion approach and evaluate the selected model.
n = length(resid(full_additive_model))
model_back_bic = step(full_additive_model, direction = "backward", k = log(n), trace = 0)
evaluate(model_back_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.934109e-15
## 2 Breusch-Pagan pvalue 7.743929e-07
## 3 R Squared 0.2423439
## 4 Adj R Squared 0.2404141
## 5 Cross Validation 16.91176
## 6 Large Hat Values 112
anova(model_back_bic, full_additive_model)
## Analysis of Variance Table
##
## Model 1: KneeScore_Patient ~ Age + Gender + Weight + Height + KneeScore_Surgeon
## Model 2: KneeScore_Patient ~ Surgeon + Year + Age + Gender + Weight +
## Height + BMI + Diagnosis + Race + Side + KneeScore_Surgeon +
## GPS + Manufacturer + FemoralComponentModel + FemoralComponentSize +
## FemoralComponentType + TibialTrayModel + TibialTraySize +
## TibialInsertModel + TibialInsertWidth + TibialInsertType +
## PatellaModel + PatellaDiameter
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1963 559528
## 2 1821 505034 142 54494 1.3837 0.002592 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<<<<<<< HEAD
The resulting BIC model scores marginally better in Cross Validation though the anova results clearly prefer the full additive model.
Next, we will evaluate the RMSE of the best performing models thus far.
options(warn=-1)
sample_idx = sample(1:nrow(knee), 1500)
train_data = knee[sample_idx, ]
test_data = knee[-sample_idx, ]
rmse = function(actual, predicted) { sqrt(mean((actual - predicted) ^ 2))}
full_test = rmse(test_data$KneeScore_Patient, predict(full_additive_model, test_data))
full_train = rmse(train_data$KneeScore_Patient, predict(full_additive_model, train_data))
test_mod_8 = rmse(test_data$KneeScore_Patient, predict(model_eight, test_data))
train_mod_8 = rmse(train_data$KneeScore_Patient, predict(model_eight, train_data))
bic_test = rmse(test_data$KneeScore_Patient, predict(model_back_bic, test_data))
bic_train = rmse(train_data$KneeScore_Patient, predict(model_back_bic, train_data))
| Model | Train RMSE | Test RMSE | ||
|---|---|---|---|---|
Model All+ |
<<<<<<< HEAD
15.9136983 | 16.3363653 | ||
Model 9 |
16.1952365 | 17.1232936 | ||
Model BIC |
16.6211189 | 17.5913828 | =======16.1325071 | 15.6349101 |
Model 9 |
16.4395967 | 16.3615981 | ||
Model BIC |
16.8626588 | 16.8401307 | >>>>>>> acc04d98d5bfe4ca322513e908756d502a7488a5
** how are these evalueated again?**
Perhaps removing some influential data points will assist in creating a better model. Next we remove influential points that satisfy the criteria for Large Leverage, Large Cook's Distance and Large RStandard values.
new_dataframe = removeInfluential(model_eight, knee)
model_nine = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = new_dataframe)
evaluate(model_nine)
## Test Result
## 1 Shapiro Wilk pvalue 2.138818e-14
## 2 Breusch-Pagan pvalue 0.006017515
## 3 R Squared 0.2818728
## 4 Adj R Squared 0.25453
## 5 Cross Validation 16.75786
## 6 Large Hat Values 122
<<<<<<< HEAD
This does seem to be of moderate benefit in relation to Adjusted \(R^2\) and Cross Validation.
We will evaluate the same model one again after removing the identified large hat values.
remove_hat_dataframe = removeLargeHatValues(model_nine, new_dataframe)
model_ten = lm(KneeScore_Patient ~ . - Diagnosis - Race - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - TibialTraySize - FemoralComponentType, data = remove_hat_dataframe)
evaluate(model_ten)
## Test Result
## 1 Shapiro Wilk pvalue 1.502033e-13
## 2 Breusch-Pagan pvalue 0.0001416128
## 3 R Squared 0.2753799
## 4 Adj R Squared 0.2592862
## 5 Cross Validation 16.92845
## 6 Large Hat Values 94
<<<<<<< HEAD
Removing the large hat values decreased the Cross Validation score. Perhaps we can find a better model by running BIC on the modified dataframe.
n = length(resid(model_ten))
model_10_bic = step(model_ten, direction = "backward", k = log(n), trace = 0)
evaluate(model_10_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.298166e-14
## 2 Breusch-Pagan pvalue 1.347535e-06
## 3 R Squared 0.2491407
## 4 Adj R Squared 0.2470959
## 5 Cross Validation 16.91151
## 6 Large Hat Values 101
<<<<<<< HEAD
Despite the reduction in leveraged values the selected model is no better than preceeding models.
sample_idx_two = sample(1:nrow(remove_hat_dataframe), 1300)
train_data_two = remove_hat_dataframe[sample_idx_two, ]
test_data_two = remove_hat_dataframe[-sample_idx_two, ]
test_mod_10 = rmse(test_data_two$KneeScore_Patient, predict(model_ten, train_data_two))
train_mod_10 = rmse(train_data_two$KneeScore_Patient, predict(model_ten, train_data_two))
test_10_bic = rmse(test_data_two$KneeScore_Patient, predict(model_10_bic, train_data_two))
train_10_bic = rmse(train_data_two$KneeScore_Patient, predict(model_10_bic, train_data_two))
| Model | Train RMSE | Test RMSE | ||
|---|---|---|---|---|
Model 10 |
<<<<<<< HEAD
15.9629066 | 24.1098868 | ||
Model 10 BIC |
16.2865348 | 23.8727 | =======16.5482181 | 22.3038911 |
Model 10 BIC |
16.8429739 | 22.1294134 | >>>>>>> acc04d98d5bfe4ca322513e908756d502a7488a5
model = lm(KneeScore_Patient ~. , data = knee)
model_back_aic = step(model, direction = "backward", trace = 0)
model_for_aic = step(model, direction = "forward", trace = 0)
model_both_aic = step(model, direction = "both", trace = 0)
n = length(resid(model))
model_back_bic = step(model, direction = "backward", k = log(n), trace = 0)
model_for_bic = step(model, direction = "forward", k = log(n), trace = 0)
model_both_bic = step(model, direction = "both", k = log(n), trace = 0)
evaluate(model)
## Test Result
## 1 Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue 0.9539336
## 3 R Squared 0.3161338
## 4 Adj R Squared 0.2609288
## 5 Cross Validation 17.07365
## 6 Large Hat Values 178
<<<<<<< HEAD
evaluate(model_back_aic)
## Test Result
## 1 Shapiro Wilk pvalue 1.123084e-14
## 2 Breusch-Pagan pvalue 2.237347e-05
## 3 R Squared 0.2562632
## 4 Adj R Squared 0.2490128
## 5 Cross Validation 16.85518
## 6 Large Hat Values 147
<<<<<<< HEAD
evaluate(model_for_aic)
## Test Result
## 1 Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue 0.9539336
## 3 R Squared 0.3161338
## 4 Adj R Squared 0.2609288
## 5 Cross Validation 17.07365
## 6 Large Hat Values 178
<<<<<<< HEAD
evaluate(model_both_aic)
## Test Result
## 1 Shapiro Wilk pvalue 1.123084e-14
## 2 Breusch-Pagan pvalue 2.237347e-05
## 3 R Squared 0.2562632
## 4 Adj R Squared 0.2490128
## 5 Cross Validation 16.85518
## 6 Large Hat Values 147
<<<<<<< HEAD
## Adj.R.Squared: 0.24 - CrossValidation: 16.9
evaluate(model_back_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.934109e-15
## 2 Breusch-Pagan pvalue 7.743929e-07
## 3 R Squared 0.2423439
## 4 Adj R Squared 0.2404141
## 5 Cross Validation 16.91176
## 6 Large Hat Values 112
<<<<<<< HEAD
evaluate(model_for_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.941649e-14
## 2 Breusch-Pagan pvalue 0.9539336
## 3 R Squared 0.3161338
## 4 Adj R Squared 0.2609288
## 5 Cross Validation 17.07365
## 6 Large Hat Values 178
<<<<<<< HEAD
evaluate(model_both_bic)
## Test Result
## 1 Shapiro Wilk pvalue 1.934109e-15
## 2 Breusch-Pagan pvalue 7.743929e-07
## 3 R Squared 0.2423439
## 4 Adj R Squared 0.2404141
## 5 Cross Validation 16.91176
## 6 Large Hat Values 112
<<<<<<< HEAD
## Remove large leverages
new_data = removeInfluential(model, knee)
model_without_large_leverages = lm(KneeScore_Patient ~ ., data = new_data)
evaluate(model_without_large_leverages)
## Test Result
## 1 Shapiro Wilk pvalue 1.061121e-14
## 2 Breusch-Pagan pvalue 0.9760654
## 3 R Squared 0.3171425
## 4 Adj R Squared 0.2622432
## 5 Cross Validation 16.88624
## 6 Large Hat Values 174
<<<<<<< HEAD
Removing the records with large hat values and refitting simply introduces more large hat values. What is unique about these rows of data?
which(hatvalues(model) == 1.0)
## 17 19 74 80 240 244 258 354 400 533 638 703 763 876 881
## 17 19 74 80 240 244 258 354 400 533 638 703 763 876 881
## 891 1047 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927
## 891 1047 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927
=======

Removing the records with large hat values and refitting simply introduces more large hat values. What is unique about these rows of data?
which(hatvalues(model) == 1.0)
## 17 19 74 80 240 244 258 354 400 533 638 703 763 844 876
## 17 19 74 80 240 244 258 354 400 533 638 703 763 844 876
## 881 891 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927
## 881 891 1093 1096 1245 1374 1406 1544 1554 1686 1775 1778 1781 1919 1927
>>>>>>> acc04d98d5bfe4ca322513e908756d502a7488a5
## 1938
## 1938
print(knee[17,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race Side
## 17 3 2010 68 Female 148 64 25.43 Osteoarthritis White Left
## KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 17 48 50 No DePuy (J&J)
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 17 Nonporous 2 cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 17 PFC Sigma 2.5 PFC Sigma 8
## TibialInsertType PatellaModel PatellaDiameter
## 17 curved 3-Post Round Dome 32
print(knee[240,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race
## 240 3 2010 66 Female 130 68 19.78 Osteoarthritis White
## Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 240 Bilateral 60 80 No Zimmer
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 240 Zimmer Precoat E - cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 240 Nexgen 4 Nexgen 10
## TibialInsertType PatellaModel PatellaDiameter
## 240 standard NexGen 32
print(knee[638,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race Side
## 638 3 2012 67 Female 167 62 30.57 Osteoarthritis White Right
## KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 638 55 55 Yes DePuy (J&J)
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 638 Sigma 4N cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 638 PFC Sigma 3 PFC Sigma 8
## TibialInsertType PatellaModel PatellaDiameter
## 638 high flex,Fixed bearing Triathlon Asymmetric Patella 35
print(knee[1093,])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race
## 1093 3 2013 83 Female 130 63 23.05 Osteoarthritis White
## Side KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 1093 Right 66 50 No DePuy (J&J)
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1093 Sigma 3 posterior stabilized
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1093 PFC Sigma 2.5 PFC Sigma 8
## TibialInsertType PatellaModel PatellaDiameter
## 1093 posterior stabilized,curved Oval Dome Patella 3-peg 35
print(knee[1775, ])
## Surgeon Year Age Gender Weight Height BMI Diagnosis Race Side
## 1775 3 2014 22 Male 240 77 28.49 Post-trauma White Right
## KneeScore_Surgeon KneeScore_Patient GPS Manufacturer
## 1775 32 65 No Zimmer
## FemoralComponentModel FemoralComponentSize FemoralComponentType
## 1775 Persona 12 cruciate retaining
## TibialTrayModel TibialTraySize TibialInsertModel TibialInsertWidth
## 1775 Persona H Persona 10
## TibialInsertType PatellaModel PatellaDiameter
## 1775 Fixed bearing Persona 38
Each of these has a suspect factor entry. Perhaps the entry only occurs infrequently in the database and is causing the entire row to be flagged as high leverage? Try pulling the suspect columns out of the LM and see what happens.
model_nine = lm(KneeScore_Patient ~ . - TibialInsertType - FemoralComponentModel - PatellaModel - TibialTrayModel - FemoralComponentType, data = knee)
evaluate(model_nine)
## Test Result
## 1 Shapiro Wilk pvalue 2.75403e-14
## 2 Breusch-Pagan pvalue 0.08531339
## 3 R Squared 0.2986236
## 4 Adj R Squared 0.259491
## 5 Cross Validation 16.96213
## 6 Large Hat Values 137
<<<<<<< HEAD

=======

>>>>>>> acc04d98d5bfe4ca322513e908756d502a7488a5